In machine learning, the hinge loss is a loss function used for training classifiers. The hinge loss is used for "maximum-margin" classification, most notably for support vector machines (SVMs).
For an intended output $t = \pm 1$ and a classifier score $y$, the hinge loss of the prediction $y$ is defined as

\ell(y) = \max(0, 1 - t \cdot y).

Note that $y$ should be the "raw" output of the classifier's decision function, not the predicted class label. For instance, in linear SVMs, $y = \mathbf{w} \cdot \mathbf{x} + b$, where $(\mathbf{w}, b)$ are the parameters of the hyperplane and $\mathbf{x}$ is the input variable(s).

When $t$ and $y$ have the same sign (meaning $y$ predicts the right class) and $|y| \ge 1$, the hinge loss $\ell(y) = 0$. When they have opposite signs, $\ell(y)$ increases linearly with $y$, and similarly if $|y| < 1$, even if it has the same sign (correct prediction, but not by enough margin).
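A minimal Python sketch of this piecewise behaviour (the function name hinge_loss is illustrative, not a library routine):

```python
import numpy as np

def hinge_loss(y, t):
    """Hinge loss max(0, 1 - t*y) for a raw score y and a label t in {-1, +1}."""
    return np.maximum(0.0, 1.0 - t * y)

print(hinge_loss(2.0, +1))   # 0.0 : correct class with margin >= 1
print(hinge_loss(0.3, +1))   # 0.7 : correct class, but inside the margin
print(hinge_loss(-2.0, +1))  # 3.0 : wrong class; loss grows linearly with y
```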
The hinge loss is not a proper scoring rule.
The hinge loss can also be extended to multiclass classification. For example, Crammer and Singer defined it for a linear classifier as

\ell(y) = \max(0, 1 + \max_{y \ne t} \mathbf{w}_y \mathbf{x} - \mathbf{w}_t \mathbf{x}),

where $t$ is the target label, and $\mathbf{w}_t$ and $\mathbf{w}_y$ are the model parameters.
Weston and Watkins provided a similar definition, but with a sum rather than a max:

\ell(y) = \sum_{y \ne t} \max(0, 1 + \mathbf{w}_y \mathbf{x} - \mathbf{w}_t \mathbf{x}).
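Both definitions can be evaluated directly from a vector of class scores $s_y = \mathbf{w}_y \mathbf{x}$, as in the following illustrative sketch (the function names are ours):

```python
import numpy as np

def crammer_singer_hinge(scores, t):
    """max(0, 1 + max_{y != t} s_y - s_t) for a vector of class scores."""
    others = np.delete(scores, t)            # scores of every class except the target
    return max(0.0, float(np.max(1.0 + others - scores[t])))

def weston_watkins_hinge(scores, t):
    """sum_{y != t} max(0, 1 + s_y - s_t)."""
    others = np.delete(scores, t)
    return float(np.sum(np.maximum(0.0, 1.0 + others - scores[t])))

scores = np.array([2.0, 0.5, 1.5])           # s_y = w_y . x for three classes
print(crammer_singer_hinge(scores, t=0))     # 0.5 : only the worst margin violation counts
print(weston_watkins_hinge(scores, t=0))     # 0.5 : here only one class violates the margin
```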
In structured prediction, the hinge loss can be further extended to structured output spaces. Structured SVMs with margin rescaling use the following variant, where $\mathbf{w}$ denotes the SVM's parameters, $\mathbf{y}$ the SVM's predictions, $\phi$ the joint feature function, and $\Delta$ the Hamming loss:
\begin{align}
\ell(\mathbf{y}) & = \max(0, \Delta(\mathbf{y}, \mathbf{t}) + \langle \mathbf{w}, \phi(\mathbf{x}, \mathbf{y}) \rangle - \langle \mathbf{w}, \phi(\mathbf{x}, \mathbf{t}) \rangle) \\
& = \max(0, \max_{y \in \mathcal{Y}} \left( \Delta(\mathbf{y}, \mathbf{t}) + \langle \mathbf{w}, \phi(\mathbf{x}, \mathbf{y}) \rangle \right) - \langle \mathbf{w}, \phi(\mathbf{x}, \mathbf{t}) \rangle)
\end{align}
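The inner maximisation (loss-augmented inference) is usually the computationally hard part; for a small finite output space it can be done by enumeration, as in this toy sketch. Everything here is hypothetical: the feature map phi, the candidate set, and the function names are chosen only to make the formula concrete.

```python
import itertools
import numpy as np

def structured_hinge(w, phi, x, t, candidates, delta):
    """Margin-rescaled structured hinge:
    max(0, max_y [delta(y, t) + <w, phi(x, y)>] - <w, phi(x, t)>)."""
    true_score = float(np.dot(w, phi(x, t)))
    augmented = [delta(y, t) + float(np.dot(w, phi(x, y))) for y in candidates]
    return max(0.0, max(augmented) - true_score)

# Toy setting: outputs are length-3 binary label tuples, delta is the Hamming loss,
# and phi is a purely illustrative joint feature map.
def delta(y, t):
    return sum(a != b for a, b in zip(y, t))

def phi(x, y):
    return np.array([x[i] if y[i] == 1 else -x[i] for i in range(len(y))])

x = np.array([0.5, -1.0, 2.0])
t = (1, 0, 1)
w = np.ones(3)
candidates = list(itertools.product([0, 1], repeat=3))
print(structured_hinge(w, phi, x, t, candidates, delta))
# 0.0 : the true labelling out-scores every competitor by at least its Hamming distance
```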
The hinge loss is a convex function, so many of the usual convex optimizers used in machine learning can work with it. It is not differentiable, but has a subgradient with respect to the model parameters $\mathbf{w}$ of a linear SVM with score function $y = \mathbf{w} \cdot \mathbf{x}$ that is given by

\frac{\partial\ell}{\partial w_i} = \begin{cases} -t \cdot x_i & \text{if } t \cdot y < 1, \\ 0 & \text{otherwise}.\end{cases}
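A single subgradient-descent step built from this expression might look as follows (an illustrative sketch; the function name and learning rate are not from the original text):

```python
import numpy as np

def hinge_subgradient_step(w, x, t, lr=0.1):
    """One subgradient step on the hinge loss for a linear score y = w . x.
    Uses d ell / d w_i = -t * x_i if t*y < 1, else 0 (a valid subgradient at t*y = 1)."""
    y = float(np.dot(w, x))
    grad = -t * x if t * y < 1 else np.zeros_like(w)
    return w - lr * grad

w = np.zeros(2)
# An example inside the margin (or misclassified) pushes w toward t * x.
w = hinge_subgradient_step(w, np.array([1.0, -2.0]), t=+1)
print(w)  # [ 0.1 -0.2]
```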
However, since the derivative of the hinge loss at $ty = 1$ is undefined, smoothed versions may be preferred for optimization, such as Rennie and Srebro's

\ell(y) = \begin{cases} \frac{1}{2} - ty & \text{if } ty \le 0, \\ \frac{1}{2}(1 - ty)^2 & \text{if } 0 < ty < 1, \\ 0 & \text{if } 1 \le ty \end{cases}

or the quadratically smoothed

\ell_\gamma(y) = \begin{cases} \frac{1}{2\gamma} \max(0, 1 - ty)^2 & \text{if } ty \ge 1 - \gamma, \\ 1 - \frac{\gamma}{2} - ty & \text{otherwise} \end{cases}

suggested by Zhang. The modified Huber loss $L$ is a special case of this loss function with $\gamma = 2$, specifically $L(t, y) = 4 \ell_2(y)$.
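Both smoothed variants are straightforward to evaluate, as in the following illustrative sketch (the function names are ours); with $\gamma = 2$, four times the quadratically smoothed loss reproduces the modified Huber value:

```python
def smooth_hinge_rennie_srebro(y, t):
    """Rennie and Srebro's smooth hinge, piecewise in z = t*y."""
    z = t * y
    if z <= 0:
        return 0.5 - z
    if z < 1:
        return 0.5 * (1 - z) ** 2
    return 0.0

def quad_smoothed_hinge(y, t, gamma=2.0):
    """Zhang's quadratically smoothed hinge; gamma -> 0 recovers the plain hinge."""
    z = t * y
    if z >= 1 - gamma:
        return max(0.0, 1 - z) ** 2 / (2 * gamma)
    return 1 - gamma / 2 - z

print(smooth_hinge_rennie_srebro(0.5, +1))          # 0.125
print(4 * quad_smoothed_hinge(0.5, +1, gamma=2.0))  # 0.25 = modified Huber loss at t*y = 0.5
```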